MPEG Audio Layer-3
History
In 1987, the IIS started to work on perceptual audio coding in the
framework of the EUREKA project EU147, Digital Audio Broadcasting (DAB).
In cooperation with the University of Erlangen (Prof. Dieter Seitzer),
the IIS devised a very powerful algorithm that is standardized as
ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3).

Without data reduction, digital audio signals typically consist of 16-bit
samples recorded at a sampling rate more than twice the actual audio
bandwidth (e.g. 44.1 kHz for Compact Discs). So you end up with more than
1.4 Mbit to represent just one second of stereo music in CD quality. By
using MPEG audio coding, you may shrink down the original sound data from
a CD by a factor of 12, without losing sound quality. Factors of 24 and
even more still maintain a sound quality that is significantly better than
what you get by just reducing the sampling rate and the resolution of your
samples. Basically, this is realized by perceptual coding techniques that
exploit how the human ear perceives sound waves.

Using MPEG audio, one may achieve a typical data reduction of
1:4 | by Layer 1 (corresponds to 384 kbps for a stereo signal) |
1:6...1:8 | by Layer 2 (corresponds to 256..192 kbps for a stereo signal) |
1:10...1:12 | by Layer 3 (corresponds to 128..112 kbps for a stereo signal) |
while still maintaining the original CD sound quality. By exploiting
stereo effects and by limiting the audio bandwidth, the coding schemes may
achieve an acceptable sound quality at even lower bitrates.

MPEG Layer-3 is the most powerful member of the MPEG audio coding family.
For a given sound quality level, it requires the lowest bitrate - or for a
given bitrate, it achieves the highest sound quality.
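The arithmetic behind these figures can be sketched in a few lines of
Python. This is an illustrative calculation only; the helper name
pcm_bitrate_kbps is made up for the example. Note that the bitrates quoted
for Layer 1 and Layer 2 (384 and 256..192 kbps) match a 48 kHz, 16-bit
stereo reference rather than the 44.1 kHz CD rate. That reading is inferred
from the numbers and should be treated as an assumption.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# CD audio: 44.1 kHz, 16 bit, stereo -> 1411.2 kbit/s
cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)
# Assumed 48 kHz reference (not stated in the text) -> 1536.0 kbit/s
ref48_kbps = pcm_bitrate_kbps(48_000, 16, 2)

# Bitrate left after applying each reduction ratio mentioned in the text.
for ratio in (4, 6, 8, 10, 12, 24):
    print(f"1:{ratio:<2}  CD -> {cd_kbps / ratio:6.1f} kbps   "
          f"48 kHz -> {ref48_kbps / ratio:6.1f} kbps")
```

At 1:12 the CD rate comes out near 118 kbps, which sits inside the
128..112 kbps range quoted for Layer 3.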
Sound Quality
Some typical performance data of MPEG Layer-3 are:
sound quality | bandwidth | mode | bitrate | reduction ratio |
telephone sound | 2.5 kHz | mono | 8 kbps * | 96:1 |
better than shortwave | 4.5 kHz | mono | 16 kbps | 48:1 |
better than AM radio | 7.5 kHz | mono | 32 kbps | 24:1 |
similar to FM radio | 11 kHz | stereo | 56...64 kbps | 26...24:1 |
near-CD | 15 kHz | stereo | 96 kbps | 16:1 |
CD | >15 kHz | stereo | 112..128 kbps | 14..12:1 |
*) Fraunhofer uses a non-ISO extension of MPEG Layer-3 for enhanced
performance ("MPEG 2.5")
In all international listening tests,
MPEG Layer-3 impressively proved its superior performance, maintaining the
original sound quality at a data reduction of 1:12 (around 64 kbit/s per
audio channel). If an application can tolerate a limited bandwidth of around
10 kHz, a reasonable sound quality for stereo signals can be achieved even
at a reduction of 1:24. For low bit-rate audio coding in broadcast
applications at bitrates of 60 kbit/s per audio channel, the ITU-R
recommends MPEG Layer-3 (ITU-R doc. BS.1115).
Details
Filter bank
The filter bank used in MPEG Layer-3 is a hybrid filter bank which
consists of a polyphase filter bank and a Modified Discrete Cosine
Transform (MDCT). This hybrid form was chosen for reasons of compatibility
with its predecessors, Layer-1 and Layer-2.
Perceptual Model
The perceptual model largely determines the quality of a given encoder
implementation. It either uses a separate filter bank or combines the
calculation of energy values (for the masking calculations) with the main
filter bank. The output of the perceptual model consists of values for the
masking threshold or the allowed noise for each coder partition. If the
quantization noise can be kept below the masking threshold, then the
compression results should be indistinguishable from the original signal.
Joint Stereo
Joint stereo coding takes advantage of the fact that both channels of a
stereo channel pair contain largely the same information. These
stereophonic irrelevancies and redundancies are exploited to reduce the
total bitrate. Joint stereo is used in cases where only low bitrates are
available but stereo signals are desired.
Quantization and Coding
A system of two nested iteration loops is the common solution for
quantization and coding in a Layer-3 encoder.
Quantization is done via a power-law quantizer. In this way, larger values
are automatically coded with less accuracy and some noise shaping is
already built into the quantization process.
The quantized values are coded by Huffman coding. As a specific method for
entropy coding, Huffman coding is lossless. It is therefore called
noiseless coding, because no noise is added to the audio signal.
The process to find the optimum gain and scalefactors for a given block,
bit-rate and output from the perceptual model is usually done by two
nested iteration loops in an analysis-by-synthesis way:
- Inner iteration loop (rate loop)
The Huffman code tables assign shorter code words
to (more frequent) smaller quantized values. If the number of bits
resulting from the coding operation exceeds the number of bits available
to code a given block of data, this can be corrected by adjusting the
global gain to result in a larger quantization step size, leading to
smaller quantized values. This operation is repeated with different
quantization step sizes until the resulting bit demand for Huffman
coding is small enough. The loop is called rate loop because it modifies
the overall coder rate until it is small enough.
- Outer iteration loop (noise control/distortion loop)
To shape the quantization
noise according to the masking threshold, scalefactors are applied to
each scalefactor band. The system starts with a default factor of 1.0
for each band. If the quantization noise in a given band is found to
exceed the masking threshold (allowed noise) as supplied by the
perceptual model, the scalefactor for this band is adjusted to reduce
the quantization noise. Since achieving a smaller quantization noise
requires a larger number of quantization steps and thus a higher
bitrate, the rate adjustment loop has to be repeated every time new
scalefactors are used. In other words, the rate loop is nested within
the noise control loop. The outer (noise control) loop is executed until
the actual noise (computed from the difference of the original spectral
values minus the quantized spectral values) is below the masking
threshold for every scalefactor band (each roughly corresponding to a
critical band).
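The two nested loops can be sketched as follows. This is a simplified
illustration under several assumptions, not the procedure from the
standard: estimate_bits is a crude stand-in for the real Huffman code
tables, the step-size and scalefactor increments (factors of 2^(1/4) and
2^(1/2)) are chosen for the example, noise is measured as mean squared
error per band, and all function names are hypothetical.

```python
import math

def quantize(x, step):
    """Power-law quantizer: the 0.75 exponent codes large values more coarsely."""
    q = int(round((abs(x) / step) ** 0.75))
    return -q if x < 0 else q

def dequantize(q, step):
    """Inverse of quantize(), up to the rounding error."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)

def estimate_bits(qvals):
    """Stand-in for Huffman coding (assumption, not the real tables):
    zero costs 1 bit, larger magnitudes cost progressively more."""
    return sum(1 if q == 0 else 2 * abs(q).bit_length() + 1 for q in qvals)

def scale(spectrum, bands, scalefactors):
    """Amplify each band by its scalefactor before quantization."""
    out = list(spectrum)
    for b, (lo, hi) in enumerate(bands):
        for i in range(lo, hi):
            out[i] = spectrum[i] * scalefactors[b]
    return out

def rate_loop(scaled, bit_budget, step=1e-4):
    """Inner loop: enlarge the step size until the block fits the budget."""
    if bit_budget < len(scaled):
        raise ValueError("budget too small even for an all-zero block")
    while True:
        qvals = [quantize(x, step) for x in scaled]
        if estimate_bits(qvals) <= bit_budget:
            return step, qvals
        step *= 2 ** 0.25

def band_noise(spectrum, qvals, step, bands, scalefactors):
    """Mean squared error per band between original and decoded values."""
    noise = []
    for b, (lo, hi) in enumerate(bands):
        err = sum((spectrum[i] - dequantize(qvals[i], step) / scalefactors[b]) ** 2
                  for i in range(lo, hi))
        noise.append(err / (hi - lo))
    return noise

def encode_block(spectrum, bands, allowed_noise, bit_budget, max_outer=32):
    """Outer loop: raise the scalefactors of bands whose noise is too high."""
    scalefactors = [1.0] * len(bands)
    result = None
    for _ in range(max_outer):  # capped: the noise target may be unreachable
        step, qvals = rate_loop(scale(spectrum, bands, scalefactors), bit_budget)
        noise = band_noise(spectrum, qvals, step, bands, scalefactors)
        result = (step, list(scalefactors), qvals, noise)
        too_noisy = [b for b, n in enumerate(noise) if n > allowed_noise[b]]
        if not too_noisy:
            break
        for b in too_noisy:
            scalefactors[b] *= 2 ** 0.5
    return result

# Toy block: 8 spectral values in two bands, with made-up noise limits.
spectrum = [1.0, -0.8, 0.5, 0.3, -0.2, 0.1, 0.05, -0.02]
bands = [(0, 4), (4, 8)]
step, sf, qvals, noise = encode_block(spectrum, bands, [1e-3, 1e-4], bit_budget=64)
print(step, sf, qvals, noise)
```

The iteration cap max_outer is there because, at a tight bit budget, there
may be no scalefactor setting that brings every band below its allowed
noise; in that case the sketch simply returns the last result it computed.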